Abstract

Background: Hepatitis B virus (HBV) infection is one of the main human health problems and causes chronic
infection in a large number of patients worldwide. As the replication of HBV depends on its host cell system, the
codon usage pattern of the viral genes might be subject to two main selective forces, namely mutation pressure
and translation selection. A deeper investigation of the relationship between HBV evolution and the host
adaptive response might therefore assist in controlling this disease.

Result: Relative synonymous codon usage (RSCU) values for the whole HBV coding sequence were studied by
principal component analysis (PCA). The characteristics of the synonymous codon usage patterns, the nucleotide
contents and the comparison of effective number of codons (ENC) values for the whole HBV coding sequence
indicated that virus mutation pressure and host translation selection interact in the process of HBV evolution.
The synonymous codon usage pattern of HBV is a mixture of coincidence with and antagonism to that of the host
cell. No difference in the genetic characteristics of HBV was observed among its epidemic areas or subtypes,
suggesting that geography has a limited influence on the evolution of this virus, whereas the genetic
characteristics based on HBV genotypes could be divided into three groups, namely (i) genotypes A and E,
(ii) genotype B, and (iii) genotypes C, D and G.

Conclusion: Codon usage patterns identified by PCA provide an alternative approach to understanding the
evolutionary trends of HBV. Furthermore, the combined action of mutation pressure and translation selection
on codon usage might shed light on the evolutionary trends of HBV genotypes.
Introduction

Hepatitis B virus (HBV) disease is one of the main global
health problems: two billion people have been infected
and 350 million people have chronic infection [1]. HBV is
the prototype member of the family Hepadnaviridae and
has a compact, circular DNA genome of about 3.2 kb in
length, with four overlapping open reading frames: the
large S region (PreS/S), PreC/C, X and P [2,3]. Moreover,
the overlapping regions of the genome are helpful for
studying the evolution of the virus through its point
mutations, because recombination is rare and any point
mutation could affect the genetic characteristics of two
overlapping genes [3]. The evolution of HBV is therefore
expected to be interdependent and constrained by the
overlap of genes [4]. In some cases, one overlapping gene
may evolve more rapidly as a consequence of negative
selection on the other [5], and the overlapping genes
might be subject to different selections [6].
Furthermore, independent adaptive selection for both
overlapping genes has been reported [7]. One of the
main features of HBV is its genetic heterogeneity [8].
There are four main subtypes, namely ayw, adw, adr
and ayr [9]. According to phylogenetic analysis of the
complete HBV genomic sequence, nine genotypes of HBV,
from genotype A to I, have been determined and divided
into approximately twenty-five subgenotypes [10-14].
HBV genotypes, which differ from each other by more
than 8% at the nucleotide level, show distinct
geographical distributions [11,15,16]. The nucleotide
composition of HBV coding sequences with various genetic
diversities is selective rather than random, because
natural selection from the host acts on the various
strains generated by mutation. In previous reports,
translation selection and compositional constraints under
mutational pressure were thought to be the major factors
accounting for codon usage variation among genomes in
microorganisms [17-24]. In some RNA viruses, mutation
pressure plays a more important role than natural
selection in shaping the synonymous codon usage pattern
[25,26]. Although compositional constraints and
translation selection are the more generally accepted
mechanisms accounting for codon usage bias [27-30], other
selective forces have also been proposed, such as
selection for fine-tuning translation kinetics and escape
from cellular antiviral responses [23,31-34]. Thus, the
codon usage pattern may be important in disclosing the
molecular mechanism and evolutionary process by which
HBV avoids the host cell response. To our knowledge, this
is the first systematic study to analyze the synonymous
codon usage pattern and evolutionary dynamics of HBV as
well as the relationship between the codon usage pattern
of HBV and that of its host.

Result 

Synonymous codon usage in HBV

The C% and U% were higher than A% and G%, and C3%
and U3% were higher than A3% and G3% in HBV (Table 1).

The overall nucleotide composition is reflected in the
nucleotide contents at the third codon position of the
HBV coding sequence (C and U predominate in both),
suggesting that compositional constraints may be one of
the factors affecting the codon usage pattern of HBV.
Over-represented synonymous codons are rare in the HBV
coding sequence and include only UCU for Ser, whereas
the under-represented codons are AUA for Ile, CCC for
Pro, ACC for Thr, GCC for Ala, and CGU and CGG for Arg
(Table 2).

The codon usage bias of HBV suggests that some 
synonymous codons are not chosen equally and 
randomly. 

Genetic relationship based on synonymous codon usage 
in HBV 

The PCA detected the first principal component (f'1),
which accounts for 23.65% of the total synonymous
codon usage variation, and the second principal
component (f'2), which accounts for 19.47% of the total
variation. With respect to geography as a potential
influence on HBV evolution, the plots show an apparent
geographical pattern. For example, the overall codon
usage pattern of HBV isolated from the Philippines and
South Korea is far from that of isolates from China and
Indonesia, and HBV isolated from Germany and Iran has a
genetic diversity similar to that of isolates from
South Africa (Figure 1).

Table 1 The overall nucleotide contents and nucleotide
contents at the synonymous third position of sense
codons in the whole coding sequence of HBV

When the strains were grouped by subtype, the plots for
subtype adw were generally divided into two groups, while
the other three subtypes seemed to share similar genetic
characteristics (Figure 2).

It is worth noting that the plots for different HBV
genotypes were generally separated from each other.
Moreover, genotypes A and B have genetic characteristics
clearly different from those of the other genotypes,
while genotypes C, D and G appear to be evolutionarily
related (Figure 3).

These results indicate that geographic distribution has
only a limited effect on the codon usage of the whole
HBV coding sequence, and that the subtypes do not, to
some degree, reflect the characteristics of HBV
evolution. In this case, codon usage variation might be
one of the factors driving HBV evolution.

The effect of mutation pressure on codon usage of HBV 

To analyze whether the evolution of HBV is shaped by
mutation pressure from the virus itself or by translation
selection from the host, the G+C content at the first and
second codon positions (GC12%) was compared with that at
synonymous third codon positions (GC3%) (Figure 4).
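
As an illustration of the two quantities compared here, the following
minimal sketch computes GC12% and GC3% for a single in-frame coding
sequence. The function name gc12_gc3 and the rule for skipping codons
(AUG, UGG and the stop codons) are assumptions made for this example;
the text does not specify how the original values were obtained.

```python
# Codons without synonymous alternatives (Met, Trp) and the stop codons are
# skipped. This exclusion rule is an assumption; conventions differ.
EXCLUDED = {"AUG", "UGG", "UAA", "UAG", "UGA"}


def gc12_gc3(cds):
    """Return (GC12%, GC3%) for one in-frame coding sequence (DNA or RNA)."""
    seq = cds.upper().replace("T", "U")
    # Split into complete codons, dropping any trailing partial codon.
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    # Keep only unambiguous codons that are not in the excluded set.
    codons = [c for c in codons if c not in EXCLUDED and set(c) <= set("ACGU")]
    if not codons:
        raise ValueError("no usable codons in sequence")
    gc_first = sum(c[0] in "GC" for c in codons)
    gc_second = sum(c[1] in "GC" for c in codons)
    gc_third = sum(c[2] in "GC" for c in codons)
    gc12 = 100.0 * (gc_first + gc_second) / (2 * len(codons))
    gc3 = 100.0 * gc_third / len(codons)
    return gc12, gc3
```

Applying such a function to each strain gives one (GC12%, GC3%) pair
per strain, which is the kind of data a plot like Figure 4 is built from.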

A highly significant correlation was observed (r =
0.432, P < 0.01), implying that mutation pressure arising
from the base composition of HBV is a main factor shaping
the genetic diversity of this virus, since its effects
are present at all codon positions. In addition, the ENC
value was calculated for each strain and plotted against
GC3% (Figure 5).

Figure 5 shows that the HBV plots aggregated below the
expected curve, suggesting that other selective forces
take part in the process of HBV evolution.
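
The text does not state how the expected curve was computed. A common
choice in codon usage studies, assumed here, is Wright's relationship
between ENC and GC3 under pure compositional constraint,
ENC = 2 + s + 29 / (s^2 + (1 - s)^2), where s is the GC3 fraction. The
function name expected_enc is hypothetical.

```python
def expected_enc(gc3_fraction):
    """Expected ENC when codon usage is determined by GC3 alone.

    Assumes Wright's formula; gc3_fraction is GC3 as a fraction in [0, 1].
    """
    s = float(gc3_fraction)
    if not 0.0 <= s <= 1.0:
        raise ValueError("gc3_fraction must lie between 0 and 1")
    return 2.0 + s + 29.0 / (s ** 2 + (1.0 - s) ** 2)
```

A strain whose observed ENC falls clearly below expected_enc(GC3) shows
more bias than composition alone would predict, which is the reading
applied to Figure 5.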



Table 2 The relationship of the synonymous codon usage
pattern between HBV and human cells

Note: the synonymous codon usage pattern of human cells was calculated
from the synonymous codon usage frequencies of human cells.



Comparative analysis of the RSCU values between HBV 
and human cell 

The synonymous codon usage patterns of this virus and of
human cells resemble each other. For example, the
patterns are similar for all synonymous codons for Phe,
Ile, Val, Ser, Ala, Tyr, His, Lys, Asp, Cys and Gly
(Table 2). A possible explanation is that adaptation of
HBV codon usage to its host under translation selection
favors the multiplication of progeny virus; that is, the
resemblance of codon usage may be favorable for HBV
replication in human cells. In contrast, CCG for Pro,
ACG for Thr, CAA for Gln and CUA for Leu, which are
under-represented in human cells, are highly used in HBV
(Table 2). This result suggests that these codons could
influence the translation rate of the sequence context
flanking them, helping the viral product to fold
correctly.
Discussion

The ENC values calculated for HBV indicate that,
although the codon usage bias of HBV is weak, codon usage
is not mainly affected by mutation pressure. For some
viruses, previous studies reported that the major factor
shaping codon usage patterns appears to be mutation
pressure rather than natural selection [19,21,24,35].
However, the comparison of synonymous codon usage between
HBV and human cells suggests that mutation pressure
interacts with translation selection in the process of
HBV evolution, although the ENC values for the whole HBV
coding sequence indicate that mutation pressure is one of
the factors influencing the codon usage pattern. This
characteristic of HBV confers adaptive advantages that
result in highly efficient dissemination of the virus
through different routes of transmission.

The pattern of codon usage has been described as a
genetic characteristic of various organisms in previous
studies [19,20,27,31,32,35,36]. Because C%, U%, U3% and
C3% play roles in the formation of optimal codons ending
in any nucleotide, the codon usage pattern of HBV is
likely influenced by compositional constraints. The
codon usage pattern of poliovirus (PV) is mostly
coincident with that of its host, while the codon usage
pattern of hepatitis A virus (HAV) is antagonistic to
that of its host [37,38]. The codon usage pattern of HBV
is a mixture of the two types of codon usage. The
coincident portion of the codon usage pattern of HBV
enables the corresponding amino acids to be translated
rapidly, while the antagonistic portion likely enables
viral proteins to be folded properly, although the
translation efficiency of the corresponding amino acids
is decreased. Latent genes of Epstein-Barr virus use
deoptimized codons, which avoids competition with host
protein translation [28], and PV has been attenuated by
rare codon pairs that induce poor translation of viral
protein sequences [27]. These results suggest that
disfavored codons may not be deleterious for the
adaptation of viruses to their host cells.
According to the codon usage patterns of HBV isolated
from different countries, the geographic factor does not
influence the formation of the codon usage pattern of
HBV. With the development of international communication
and the highly efficient dissemination of HBV through
various routes of transmission, geography seems to place
only a weak limit on the distribution of HBV in different
countries. Interestingly, the four main subtypes of HBV
show no significant difference in the genetic
characteristics shaped by different human races. This
result might suggest that translation selection from the
human host is not the only factor shaping the overall
codon usage pattern of this virus, and that mutation
pressure from HBV itself is a main force driving HBV
evolution. Genotyping of HBV is of high interest because
there is increasing evidence that HBV genotypes may be
associated with HBeAg seroconversion rates, mutations
occurring in the precore and core promoter regions,
severity of liver disease and treatment response
[15,16,39,40]. There is a significant difference in the
overall codon usage pattern of HBV between genotypes A, B
and E and genotypes C, D and G. HBV genotypes and
subgenotypes have been associated with differences in
clinical and virological characteristics, showing that
they may play a role in the virus-host relationship [41].
It has been shown that genotypes C and D are associated
with more serious liver injuries and with a higher
incidence of hepatocellular carcinoma (HCC) than
genotypes A and B [42-44]. In addition, patients infected
with genotype C or D have a much lower rate of response
to interferon therapy than those infected with genotype A
or B [40,45]. Moreover, subtle differences in the
frequency and type of lamivudine-resistant variants occur
between genotype A and D infections [15]. An evolutionary
approach to HBV infection, based on the principles of
natural selection, may help explain how modes of
transmission favor some genotypes and subgenotypes over
others and influence HBV virulence.

The genetic diversity and codon usage patterns reported
here are helpful for understanding the processes of HBV
evolution, especially the roles played by translation
selection from the host and mutation pressure from the
virus. Additionally, such information might help clarify
the roles of geographic and subtype factors in the
process of HBV evolution.
Materials and methods

Sequence data 

The 58 complete genome sequences of HBV were downloaded
from the National Center for Biotechnology Information
(NCBI) http://www.ncbi.nlm.nih.gov/Genbank/ and detailed
information about the viruses is listed in Table 3.

The overall nucleotide composition (U%, A%, C% and G%)
and the nucleotide composition at the third codon
position (U3%, A3%, C3% and G3%) of the HBV coding
sequence were calculated with the software DNAStar 7.0
for Windows.

The calculation of the relative synonymous 
codon usage (RSCU) 

The relative synonymous codon usage (RSCU) values for
the whole coding sequences of the 58 HBV strains were
calculated as previously described [46]. RSCU values do
not depend on amino acid composition or on the size of
the coding sequence, because both factors are eliminated
in the calculation. An RSCU value of 1.0 means that the
codon is chosen equally and randomly. An RSCU value
above 1.0 or below 1.0 indicates that the codon is used
more or less frequently than expected, respectively.
Synonymous codons with RSCU values above 1.6 were
regarded as over-represented, while those with RSCU
values below 0.6 were regarded as under-represented [47].
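
The calculation can be sketched as follows. This is an illustration,
not the procedure of reference [46]: it assumes the usual definition, in
which the RSCU of a codon is its observed count divided by the mean
count of all synonymous codons for the same amino acid, and it assumes
the standard genetic code. The names CODON_TABLE and rscu are
hypothetical.

```python
from collections import Counter
from itertools import product

# Standard genetic code in UCAG order ("*" marks stop codons).
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds):
    """Return {codon: RSCU} for the 59 codons that have synonymous alternatives."""
    seq = cds.upper().replace("T", "U")
    counts = Counter(seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3))
    # Group codons by amino acid, skipping stops, Met (AUG) and Trp (UGG).
    families = {}
    for codon, aa in CODON_TABLE.items():
        if aa not in "*MW":
            families.setdefault(aa, []).append(codon)
    values = {}
    for codons in families.values():
        total = sum(counts[c] for c in codons)
        for c in codons:
            # Observed count divided by the mean count within the family;
            # NaN when the amino acid does not occur in the sequence.
            values[c] = counts[c] * len(codons) / total if total else float("nan")
    return values
```

With the thresholds given above, a codon would be flagged as
over-represented when its value exceeds 1.6 and as under-represented
when it falls below 0.6.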

Analysis of codon usage bias 

The 'effective number of codons' (ENC), a useful
estimator of absolute codon usage bias, was used to
quantify the codon usage bias of the whole coding
sequence of HBV. The ENC value ranges from 20 (when only
one synonymous codon is used for each amino acid) to 61
(when all synonymous codons are used equally) [48].
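
The text gives only the range of ENC, not its formula. The sketch below
assumes Wright's commonly used formulation: for each amino acid with n
observed codons, the homozygosity is F = (n * sum(p_i^2) - 1) / (n - 1),
F is averaged within each degeneracy class, and
ENC = 2 + 9/F2 + 1/F3 + 5/F4 + 3/F6, capped at 61. The function name and
the handling of missing classes are assumptions made for this example.

```python
from collections import Counter
from itertools import product

# Standard genetic code in UCAG order ("*" marks stop codons).
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def effective_number_of_codons(cds):
    """Estimate ENC for one in-frame coding sequence (assumed Wright formula)."""
    seq = cds.upper().replace("T", "U")
    counts = Counter(seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3))
    families = {}
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families.setdefault(aa, []).append(codon)
    # Collect homozygosity values per degeneracy class (2, 3, 4 or 6 codons).
    by_class = {2: [], 3: [], 4: [], 6: []}
    for codons in families.values():
        k = len(codons)
        n = sum(counts[c] for c in codons)
        if k == 1 or n < 2:
            continue  # Met, Trp, or too few observations to estimate F
        sum_p2 = sum((counts[c] / n) ** 2 for c in codons)
        f = (n * sum_p2 - 1.0) / (n - 1.0)
        if f > 0:
            by_class[k].append(f)
    mean_f = {k: sum(v) / len(v) for k, v in by_class.items() if v}
    # If Ile (the only 3-fold amino acid) is unusable, average F2 and F4.
    if 3 not in mean_f and 2 in mean_f and 4 in mean_f:
        mean_f[3] = (mean_f[2] + mean_f[4]) / 2.0
    if set(mean_f) != {2, 3, 4, 6}:
        raise ValueError("sequence too short to estimate ENC")
    enc = 2.0 + 9.0 / mean_f[2] + 1.0 / mean_f[3] + 5.0 / mean_f[4] + 3.0 / mean_f[6]
    return min(enc, 61.0)
```

The two extremes described above correspond to F = 1 in every class
(ENC of 20) and to nearly uniform usage (ENC capped at 61).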

Principal component analysis 

Principal component analysis (PCA), a commonly used
multivariate statistical method [24], was carried out to
analyze the major trends in codon usage pattern among
different strains of HBV. PCA is a mathematical procedure
that transforms a set of correlated variables (here, RSCU
values) into a smaller number of uncorrelated variables
called principal components. Each strain was represented
as a 59-dimensional vector, in which each dimension
corresponded to the RSCU value of one sense codon; only
codons with synonymous alternatives were included,
excluding AUG, UGG and the three stop codons.
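
A minimal sketch of this step is given below, assuming NumPy is
available and that PCA is performed on the mean-centered RSCU matrix by
singular value decomposition. Whether the original analysis used the
covariance or the correlation matrix is not stated, so the scaling is an
assumption, and the demonstration matrix is random placeholder data, not
HBV data. The function name pca_scores is hypothetical.

```python
import numpy as np


def pca_scores(rscu_matrix, n_components=2):
    """Project strains (rows) onto the leading principal components.

    rscu_matrix: array of shape (n_strains, 59), one RSCU vector per strain.
    Returns (scores, explained), where explained holds the fraction of the
    total variance carried by each returned component.
    """
    x = np.asarray(rscu_matrix, dtype=float)
    centered = x - x.mean(axis=0)  # center each codon column
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = (s ** 2) / np.sum(s ** 2)
    return scores, explained[:n_components]


# Placeholder input with the same shape as the data set described above
# (58 strains, 59 codons); the values are random and carry no meaning.
demo = np.random.default_rng(0).random((58, 59))
demo_scores, demo_explained = pca_scores(demo)
```

The two columns of the returned scores play the role of the f'1 and f'2
coordinates of each strain, and the explained fractions correspond to
percentages such as those quoted in the Result section.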



Table 3 The information of HBV strains in this study

No.  Accession No.    f'1     f'2    ENC value
1    AF405706        -0.79    1.32   56.41
2    X04615          -0.82    0.50   55.88
3    AB033554        -1.11   -0.90   55.78
4    AY741798        -0.82    1.31   56.17
5    AY741797        -0.82    1.15   55.82
6    AY741796        -0.72    1.23   56.62
7    AY741795        -0.75    1.26   56.59
8    AY741794        -0.73    1.26   56.61
9    AF100309        -1.02   -1.17   55.92
10   M57663           0.87   -1.05   55.48
11   AF100308        -1.16   -1.69   55.70
12   U87747          -0.38   -0.96   57.29
13   U87746           0.49   -0.27   55.71
14   AY123041        -0.69    0.77   55.94
15   AF068756        -0.48    0.70   56.39
16   AF282918        -0.84   -1.22   55.98
17   U95551          -0.99    0.62   56.36
18   GQ872210        -0.02    1.01   56.07
19   GQ161818         0.54    0.11   56.88
20   GQ161805         0.56    0.08   56.87
21   GQ161799         0.56    0.11   56.88
22   AY796032        -0.49    1.42   56.08
23   AY796031        -0.43    1.17   56.08
24   AY796030        -0.47    0.68   56.67
25   AF282917        -1.07   -1.45   55.70
26   AY233296        -0.07    1.39   55.62
27   AY23329         -0.38    1.30   56.04
28   AY233294        -0.33    1.62   55.95
29   AY233293        -0.39    1.51   55.92
30   AY233291        -0.45    1.29   55.95
31   AY233290         1.42    0.25   56.75
32   AY233289         1.57   -0.49   56.66
33   AY233288         1.39   -0.33   56.84
34   AY233287         1.55   -0.14   56.82
35   AY233286         1.03    0.00   56.78
36   AY233285         1.26   -0.54   56.52
37   AY233284         1.38   -0.24   56.78
38   AY233283         1.49   -0.45   56.54
39   AY233282         1.35   -0.17   56.73
40   AY233281         1.31   -0.08   56.95
41   AY233280         1.19    0.18   56.82
42   AY233279         1.34    0.04   56.90
43   AY233278         0.86   -0.56   56.37
44   AY233277         1.55   -0.15   56.88
45   AY233276         1.38   -0.38   56.83
46   AY233275         1.87    0.03   56.79
47   AY233274         1.34   -0.30   56.60
48   AY233273        -0.49   -0.80   56.45
49   DQ448628        -1.07   -1.31   55.84
50   DQ448627        -1.07   -1.56   55.84
51   DQ448625        -1.07   -1.56   55.68
52   DQ448623        -1.07   -1.34   55.76
53   DQ448622        -0.81   -1.44   55.90
54   DQ448621        -0.94   -1.10   56.24
55   DQ448620        -1.02   -1.46   55.77
56   DQ448620        -0.90   -1.29   56.01
57   AY373432        -0.73    1.26   56.61
58   AY373430        -0.93    0.82   55.86

Note: f'1 and f'2 were calculated by the PCA method.



Correlation analysis 

The relationships between the overall nucleotide
composition (U%, A%, C% and G%) and the nucleotide
composition at the third codon position (U3%, A3%, C3%
and G3%) of the HBV coding sequence, and between U3%,
A3%, C3% and G3% and the codon usage pattern of HBV, were
evaluated by Pearson's rank correlation analysis.

All statistical analyses were carried out with the
statistical software SPSS 11.5 for Windows.
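
For readers without SPSS, the correlation coefficient can be reproduced
with a few lines of code. The sketch below assumes that the ordinary
product-moment (Pearson) coefficient is meant; the wording "Pearson's
rank" leaves open whether a rank-based coefficient was used instead. The
function name pearson_r is hypothetical, and no significance test is
included.

```python
import math


def pearson_r(xs, ys):
    """Product-moment correlation between two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)
```

Passing the per-strain GC12% values as xs and the GC3% values as ys
would give a coefficient of the kind reported in the Result section.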